Integration of multiple feature sets for reducing ambiguity in automatic speech recognition
Author
Abstract
This thesis presents a method for investigating the extent to which articulatory-based acoustic features can be exploited to reduce ambiguity in automatic speech recognition search. The proposed method uses a lattice re-scoring paradigm to integrate articulatory-based features into automatic speech recognition systems. Time-delay neural networks are trained as feature detectors to generate feature streams over which hidden Markov models (HMMs) are defined. These articulatory-based HMMs are combined with HMMs defined over spectral-energy-based Mel-frequency cepstral coefficient (MFCC) acoustic features through a sequential lattice re-scoring procedure. During recognition, the optimum phone strings are found by maximizing the log-linear combination of acoustic and language model likelihoods. The associated log-linear weights are estimated using a discriminative model combination approach. All experiments are performed on the DARPA TIMIT speech database, and the results are reported in terms of phone accuracy.
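The log-linear combination described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a short list of candidate phone strings standing in for a lattice, and the names `combined_score` and `rescore`, the stream labels, the log-likelihood values, and the weights are all hypothetical. In the thesis the weights are estimated by discriminative model combination; here they are simply chosen by hand.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical container: a candidate phone string plus one
# log-likelihood per knowledge source (MFCC HMMs, articulatory HMMs, LM).
Hypothesis = Tuple[Sequence[str], Dict[str, float]]


def combined_score(log_likelihoods: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of log-likelihoods (log-linear model combination)."""
    return sum(weights[name] * log_likelihoods[name] for name in weights)


def rescore(hypotheses: List[Hypothesis],
            weights: Dict[str, float]) -> Sequence[str]:
    """Return the phone string with the highest combined score."""
    best = max(hypotheses, key=lambda h: combined_score(h[1], weights))
    return best[0]


# Made-up scores for two competing phone strings (illustration only).
hypotheses: List[Hypothesis] = [
    (("sil", "k", "ae", "t", "sil"),
     {"mfcc": -120.0, "articulatory": -95.0, "lm": -12.0}),
    (("sil", "k", "ah", "t", "sil"),
     {"mfcc": -118.0, "articulatory": -101.0, "lm": -14.0}),
]

# Hand-picked weights; the thesis estimates them discriminatively.
weights = {"mfcc": 1.0, "articulatory": 0.5, "lm": 2.0}

best_combined = rescore(hypotheses, weights)
best_mfcc_only = rescore(hypotheses, {"mfcc": 1.0})
```

With these invented numbers, the MFCC stream alone prefers the second string, while the combined score prefers the first, which is the kind of disambiguation the additional feature stream is meant to provide.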
Similar resources
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...
Improving the performance of MFCC for Persian robust speech recognition
Mel frequency cepstral coefficients are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have attracted the attention of researchers. Among these, lip-reading techniques have encountered many challenges in speech recognition; one of the challenges b...